A consistent clustering-based approach to
estimating the number of change-points in highly
dependent time-series

Azadeh Khaleghi Daniil Ryabko

azadeh.khaleghi@inria.fr daniil.ryabko@inria.fr

INRIA, Lille Nord-Europe

Abstract 

The problem of change-point estimation is considered under a general
framework where the data are generated by unknown stationary ergodic
process distributions. In this context, the consistent estimation of the
number of change-points is provably impossible. However, it is shown
that a consistent clustering method may be used to estimate the number
of change points, under the additional constraint that the correct number
of process distributions that generate the data is provided. This additional
parameter has a natural interpretation in many real-world applications.
An algorithm is proposed that estimates the number of change-points and
locates the changes. The proposed algorithm is shown to be asymptotically
consistent; its empirical evaluations are provided.

1 Introduction

Change-point estimation is a classical problem in statistics and machine
learning, with applications in a broad range of domains, such as market
analysis, bioinformatics, audio and video segmentation, and fraud detection,
to name only a few. The change-point problem may be described as follows. A
sequence $x := X_1, \dots, X_n$ is composed of some (unknown) number $\kappa + 1$ of
non-overlapping segments. Each segment is generated by one of $r$ (unknown)
stochastic process distributions. The process distributions that generate every
pair of consecutive segments are different. The index where one segment ends
and another starts is called a change point. The change-points are unknown,
and the objective is to estimate them given $x$.

In this work we consider the change-point problem for highly dependent
data, making as few assumptions as possible on how the data are generated.
In particular, the distributions that generate the data are unknown and can be
arbitrary; the only assumption is that they are stationary ergodic. This means
that we make no such assumptions as independence, finite memory or mixing.
Moreover, we do not require the finite-dimensional marginals of any fixed size
before and after the change points to be different.

However, with no further assumptions or additional information, the estimation
of the number of change-points is impossible even in the weakest asymptotic
sense. Indeed, as shown by Ryabko (2010b), it is impossible to distinguish
even between the cases of 0 and 1 change-point in this setting, even for binary
sequences. As an alternative to imposing stronger assumptions on the
distributions that would allow for the estimation of the number of change points, we
assume that the correct number $r$ of the process distributions that generate $x$ is
provided as a parameter.

This formulation is motivated by applications. Indeed, the assumption that 
the time-series data are highly dependent complies well with most real-world 
scenarios. Moreover, in many applications the number r of distributions is a 
natural parameter of the problem. For instance, the case of just $r = 2$ distributions
can be interpreted as normal versus abnormal behavior; one can imagine
a sequence with many change-points in this scenario. Another application con- 
cerns the problem of author attribution in a given text written collaboratively 
by a known number r of authors. In speech segmentation r may be the total 
number of speakers. In video surveillance as well as in fraud detection, the 
change may refer to the point where normal activity becomes abnormal (r=2). 
The identification of coding versus non-coding regions in genomic data is yet 
another potential application. In other words, in many real-world applications 
the number r of process distributions comes with a natural interpretation. 

Main Results. We propose a nonparametric algorithm to estimate the 
number of change points and to locate the changes in time-series data. We 
demonstrate both theoretically and experimentally that our algorithm is asymp- 
totically consistent in the general framework described. A key observation 
we make is that given the total number r of process distributions, estimating 
the number of change-points is possible via a consistent time-series clustering 
method. We use a so-called list-estimator to generate an exhaustive list of 
change-point candidates. This induces a partitioning of the sequence into con- 
secutive segments. We then apply a simple clustering algorithm to group these 
segments into r clusters. The clustering procedure uses farthest-point initializa- 
tion to designate r cluster centers, and then assigns each remaining point to the 
nearest center. To measure the distance between the segments, empirical
estimates of the so-called distributional distance (Gray 1988) are used (cf. Ryabko
2010a). Within each cluster, a change-point candidate that joins a pair of
consecutive segments is identified as redundant. Finally, we remove the redundant
estimates from the list and provide the remaining estimates as output. The
consistency of the proposed method can be established using any list-estimator
that is consistent under the considered framework, in combination with the
time-series clustering algorithm mentioned above. An example of a consistent
list-estimator is provided by Khaleghi and Ryabko (2012a). Thus, the proposed
method establishes a new link between two classical unsupervised learning
problems: clustering and change-point analysis, potentially bringing a new insight
to both communities.

Related Work. In a typical formulation of the change-point problem the
samples within each segment are assumed to be generated i.i.d., the
distributions have known forms and the change is in the mean. In more general
nonparametric settings, the form of the change and/or the nature of dependence
are usually restricted. For example, the process distributions are assumed to
be strongly mixing (Brodsky and Darkhovsky 1993; Basseville and Nikiforov 1993;
Giraitis et al. 1996; Hariz et al. 2007; Carlstein and Lele 1993), and
the finite-dimensional marginals are almost exclusively assumed to be different.



The problem of estimating the number of change-points is nontrivial, even un- 
der these more restrictive assumptions. In such settings, this problem is usually 
addressed with penalized criteria; see, for example, (Lebarbier 2005; Lavielle
2005). Such criteria necessarily rely on additional parameters, and the resulting
number of change-points depends on these parameters. Note that the algo- 
rithm proposed in this work also requires an input parameter: the number r 
of distributions. However, this parameter has a natural interpretation in many 
real-world applications as discussed above. 

For the general framework considered in this work, the particular case of a
known number $\kappa$ of change points has been considered in (Ryabko and Ryabko
2010) ($\kappa = 1$) and (Khaleghi and Ryabko 2012b) ($\kappa > 1$). However, if the number
$\kappa$ of change-points provided to the algorithm is incorrect, the behavior of these
algorithms can be arbitrarily bad. An intermediate solution for the case of
unknown $\kappa$ in this general setting is given by Khaleghi and Ryabko (2012a),
where a list estimator is proposed: a (sorted) list of possibly more than $\kappa$
candidate estimates is produced whose first $\kappa$ elements are consistent estimates
of the change-points. The algorithms in these works, as well as in the present
paper, are based on empirical estimates of distributional distance, which turns
out to be a rather versatile tool for studying stationary ergodic time series.

Organization. In Section 2 we introduce some preliminary notation and
definitions. In Section 3 we formalize the problem. In Section 4 we present
our algorithm and give an informal description, and in Section 5 we prove the
main consistency result. In Section 6 we present some experimental results and
finally in Section 7 we provide our conclusions.



2 Preliminaries 

Let $\mathcal{X}$ be a measurable space (the domain); in this work we let $\mathcal{X} = \mathbb{R}$, but
extensions to more general spaces are straightforward. For a sequence $X_1, \dots, X_n$ we
use the abbreviation $X_{1..n}$. Consider the Borel $\sigma$-algebra $\mathcal{B}$ on $\mathcal{X}^\infty$ generated by
the cylinders $\{B \times \mathcal{X}^\infty : B \in B^{m,l},\ m, l \in \mathbb{N}\}$, where the sets $B^{m,l}$, $m, l \in \mathbb{N}$ are
obtained via the partitioning of $\mathcal{X}^m$ into cubes of dimension $m$ and volume $2^{-ml}$
(starting at the origin). Let also $B^m := \cup_{l \in \mathbb{N}} B^{m,l}$. Process distributions are
probability measures on the space $(\mathcal{X}^\infty, \mathcal{B})$. For $x = X_{1..n} \in \mathcal{X}^n$ and $B \in B^m$
let $\nu(x, B)$ denote the frequency with which $x$ falls in $B$, i.e.

$$\nu(x, B) := \frac{\mathbb{I}\{n \ge m\}}{n - m + 1} \sum_{i=1}^{n-m+1} \mathbb{I}\{X_{i..i+m-1} \in B\}. \tag{1}$$

A process $\rho$ is stationary if for any $i, j \in 1..n$ and $B \in B^m$, $m \in \mathbb{N}$, we have
$\rho(X_{1..j} \in B) = \rho(X_{i..i+j-1} \in B)$. A stationary process $\rho$ is called ergodic if for
all $B \in \mathcal{B}$ with probability 1 we have $\lim_{n \to \infty} \nu(X_{1..n}, B) = \rho(B)$.

Definition 1 (Distributional Distance). The distributional distance between a pair of
process distributions $\rho_1, \rho_2$ is defined as follows (see Gray 1988):

$$d(\rho_1, \rho_2) := \sum_{m,l=1}^{\infty} w_m w_l \sum_{B \in B^{m,l}} |\rho_1(B) - \rho_2(B)|,$$

where we set $w_j := 1/j(j+1)$, but any summable sequence of positive weights
may be used.
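As a small illustration of why these weights are convenient, the sketch below checks with exact rational arithmetic that $w_j = 1/j(j+1)$ telescopes, so that the first $T$ weights sum to $1 - 1/(T+1)$ and the tail from $T$ onwards sums to $1/T$. The helper names `weight`, `partial_sum` and `tail_sum` are illustrative and do not come from the paper.

```python
from fractions import Fraction


def weight(j: int) -> Fraction:
    """w_j = 1 / (j (j + 1)), the weights of Definition 1."""
    return Fraction(1, j * (j + 1))


def partial_sum(T: int) -> Fraction:
    """Sum of w_j for j = 1..T; telescopes to 1 - 1/(T + 1)."""
    return sum((weight(j) for j in range(1, T + 1)), Fraction(0))


def tail_sum(T: int) -> Fraction:
    """Sum of w_j for j >= T, computed as 1 minus the first T - 1 terms."""
    return 1 - partial_sum(T - 1)
```

Since the weights sum to 1 and the inner sum over cells is at most 2, discarding all terms with index at least $T$ changes the distance by a quantity that vanishes as $T$ grows, which is what makes a truncated computation meaningful.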

In words, this involves partitioning the sets $\mathcal{X}^m$, $m \in \mathbb{N}$ into cubes of
decreasing volume (indexed by $l$) and then taking a sum over the differences in
probabilities of all the cubes in these partitions. The differences in probabilities
are weighted: smaller weights are given to larger $m$ and finer partitions. We use
empirical estimates of this distance defined as follows.

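Definition 2 (Empirical estimates of $d(\cdot,\cdot)$). The empirical estimate of the
distributional distance between a sequence $x = X_{1..n} \in \mathcal{X}^n$, $n \in \mathbb{N}$ and a process
distribution $\rho$ is given by

$$\hat{d}(x, \rho) := \sum_{m,l=1}^{\infty} w_m w_l \sum_{B \in B^{m,l}} |\nu(x, B) - \rho(B)| \tag{2}$$

and that between a pair of sequences $x_i \in \mathcal{X}^{n_i}$, $n_i \in \mathbb{N}$, $i = 1, 2$ is defined as

$$\hat{d}(x_1, x_2) := \sum_{m,l=1}^{\infty} w_m w_l \sum_{B \in B^{m,l}} |\nu(x_1, B) - \nu(x_2, B)|. \tag{3}$$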
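To make the empirical estimate between two sequences concrete, here is a minimal Python sketch. It is an illustration under stated assumptions rather than the authors' implementation: the infinite sums are simply truncated at `max_m` and `max_l` (both hypothetical parameters), cubes are taken to have side $2^{-l}$ so that their volume in dimension $m$ is $2^{-ml}$, and the function names are invented for this example.

```python
import math
from collections import Counter


def weight(j):
    """w_j = 1 / (j (j + 1))."""
    return 1.0 / (j * (j + 1))


def frequencies(x, m, l):
    """Empirical frequencies nu(x, B) over cubes B of side 2**-l in R^m.

    Returns a dict mapping a cube index (tuple of ints) to its frequency.
    By convention the result is empty when the sequence is shorter than m.
    """
    n = len(x)
    if n < m:
        return {}
    scale = 2 ** l
    counts = Counter(
        tuple(math.floor(v * scale) for v in x[i:i + m])
        for i in range(n - m + 1)
    )
    total = n - m + 1
    return {cell: c / total for cell, c in counts.items()}


def empirical_distance(x1, x2, max_m=3, max_l=6):
    """Truncated version of the empirical distance between two sequences."""
    d = 0.0
    for m in range(1, max_m + 1):
        for l in range(1, max_l + 1):
            f1 = frequencies(x1, m, l)
            f2 = frequencies(x2, m, l)
            cells = set(f1) | set(f2)
            diff = sum(abs(f1.get(c, 0.0) - f2.get(c, 0.0)) for c in cells)
            d += weight(m) * weight(l) * diff
    return d
```

Only cubes that actually contain a window of either sequence contribute, so the inner sum is finite even though each partition has infinitely many cells.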



While the calculation of $\hat{d}(\cdot,\cdot)$ involves infinite summations it is fully tractable.

Remark 1 (Calculating $\hat{d}(\cdot,\cdot)$). Consider a pair of sequences $x_i := X^i_1, \dots, X^i_{n_i} \in \mathcal{X}^{n_i}$
with $n_i \in \mathbb{N}$, $i = 1, 2$. Let $s_{\min}$ correspond to the partition where each cell
$B \in \mathcal{B}$ contains at most one point, i.e.



$$s_{\min} := \min_{\substack{i, j \in 1..\min\{n_1, n_2\} \\ X^1_i \ne X^2_j}} |X^1_i - X^2_j|.$$



Indeed in $\hat{d}$ all summands corresponding to $m > \max_{i=1,2} n_i$ equal 0; moreover,
all summands corresponding to $l > s_{\min}$ are equal. Thus as shown by
Ryabko (2010a) even the most naive implementation of $\hat{d}(x_1, x_2)$ has
computational complexity $O(n^2 \log n \log s_{\min}^{-1})$ which may be further optimized to
$O(n\,\mathrm{polylog}\,n)$; see (Khaleghi and Ryabko 2012a; Ryabko 2010a; Khaleghi
et al. 2012).

3 Problem Formulation



We formalize the problem as follows. The sequence $x := X_1, \dots, X_n \in \mathcal{X}^n$, $n \in \mathbb{N}$
is formed as the concatenation of some unknown number $\kappa + 1$ of sequences

$$X_{1..n\theta_1},\ X_{n\theta_1+1..n\theta_2},\ \dots,\ X_{n\theta_\kappa+1..n}$$

where $\theta_k \in (0, 1)$, $k = 1..\kappa$. Each of the sequences $x_k := X_{n\theta_{k-1}+1..n\theta_k}$, $k = 1..\kappa + 1$,
$\theta_0 := 0$, $\theta_{\kappa+1} := 1$ is generated by one out of $r \le \kappa + 1$ unknown
stationary ergodic process distributions $\rho_1, \dots, \rho_r$. Thus, there exists a
ground-truth partitioning

$$\{G_1, \dots, G_r\} \tag{4}$$

of the set $\{1..\kappa + 1\}$ into $r$ disjoint subsets where for every $k = 1..\kappa + 1$ and
$r' = 1..r$ we have $k \in G_{r'}$ if and only if $x_k$ is generated by $\rho_{r'}$. The parameters
$\theta_k$, $k = 1..\kappa$ are called change-points since the indices $n\theta_k$, $k = 1..\kappa$ separate
consecutive segments $x_k$, $x_{k+1}$ generated by different process distributions. The
change-points are unknown, and our goal is to estimate them given the sequence
$x$. The process distributions $\rho_1, \dots, \rho_r$ are completely unknown and may even
be dependent. Moreover, the means, variances, or more generally, the
finite-dimensional marginal distributions of any fixed size before and after the
change-points are not required to be different. We consider the most general scenario
where the process distributions are different. Let the minimum separation of
the change-points be defined as

$$\lambda_{\min} := \min_{k = 1..\kappa+1} \theta_k - \theta_{k-1}. \tag{5}$$

Since the consistency properties we are after are asymptotic in n, we require 
that $\lambda_{\min} > 0$. This is because if the length of one of the sequences is constant or
sub-linear in n then asymptotic consistency is impossible in this setting. Note, 
however, that we do not make any assumptions on the distance between the 
process distributions (e.g., the distributional distance): they may be arbitrarily 
close. 

Since it is provably impossible (Ryabko 2010b) to distinguish between the
case of one and zero change-points in this general framework, the number $\kappa$
of change-points cannot be estimated with no further information. Instead of
making additional assumptions on the nature of the distributions generating the
data, we assume that the total number $r$ of distributions is provided (while the
number $\kappa$ of change-points remains unknown).

Thus, the problem formulation we consider is as follows: given a sequence $x$,
a lower bound $\lambda$ on the minimum separation $\lambda_{\min}$ of the change points, and the total
number of distributions $r$, it is required to find the number of changes $\kappa$ and
estimate the change points $\pi_1, \dots, \pi_\kappa$, where $\pi_k := n\theta_k$. A change-point estimator is a function
that takes a sequence $x \in \mathcal{X}^n$, $n \in \mathbb{N}$ to produce a number $\hat{\kappa}$ (estimated number
of change points) and a set $\{\hat{\theta}_1(n), \dots, \hat{\theta}_{\hat{\kappa}}(n)\} \in (0, 1)^{\hat{\kappa}}$ of estimated change
points. It is asymptotically consistent if with probability 1 we have $\hat{\kappa} = \kappa$ from
some $n$ on and

$$\lim_{n \to \infty} \sup_{k = 1..\kappa} |\hat{\theta}_k(n) - \theta_k| = 0.$$

The algorithm we propose relies on a so-called list-estimator, which is a
procedure that, given $x$ and $\lambda$, outputs a (long, exhaustive) list of change point
estimates, without attempting to estimate the number of changes. More
precisely, we have the following definition.

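Definition 3 (List-estimator). A list-estimator $\Upsilon$ is a function that, given a sequence
$x \in \mathcal{X}^n$ and a number $\lambda \in (0, 1)$, produces a set $\Upsilon(x, \lambda) \in \cup_{m \in \mathbb{N}} (0, 1)^m$ of some
$m \in \mathbb{N}$ estimates $\Upsilon(x, \lambda) := \{\hat{\theta}_1(n), \dots, \hat{\theta}_m(n)\}$, that are at least $\lambda$ apart:

$$\inf_{i \ne j \in 0..m+1} |\hat{\theta}_i(n) - \hat{\theta}_j(n)| \ge \lambda$$

where $\hat{\theta}_0(n) := 0$, $\hat{\theta}_{m+1}(n) := 1$.

Let $x$ have change-points at least $\lambda_{\min}$ apart for some $\lambda_{\min} \in (0, 1)$. A
list-estimator $\Upsilon$ is said to be consistent if for every $\lambda \in (0, \lambda_{\min})$ there is a subset
$\{\hat{\theta}_{\mu_1}(n), \dots, \hat{\theta}_{\mu_\kappa}(n)\}$ of $\Upsilon(x, \lambda)$ for some $\mu_i \in 1..m$, $i = 1..\kappa$ such that with
probability one we have

$$\lim_{n \to \infty} \sup_{k = 1..\kappa} |\hat{\theta}_{\mu_k}(n) - \theta_k| = 0.$$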
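The spacing requirement in the definition of a list-estimator can be checked mechanically. The following sketch, with a hypothetical helper name `is_separated`, verifies that a set of estimates in $(0, 1)$, padded with the boundary points 0 and 1, has all pairwise gaps at least $\lambda$; after sorting, it is enough to inspect adjacent gaps.

```python
def is_separated(estimates, lam):
    """Check that estimates, together with 0 and 1, are at least lam apart."""
    points = [0.0] + sorted(estimates) + [1.0]
    # After sorting, the smallest pairwise gap is attained by neighbours.
    return all(b - a >= lam for a, b in zip(points, points[1:]))
```

For example, `is_separated([0.25, 0.5, 0.75], 0.2)` holds, while a list containing 0.05 fails for `lam = 0.1` because it is too close to the left boundary.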



An example of a consistent list-estimator is provided in (Khaleghi and Ryabko
2012a). In particular we use the following statement.

Proposition 1 (Khaleghi and Ryabko (2012a)). There exists a consistent
list-estimator $\Upsilon$.



4 Main Result 

In this section we introduce an asymptotically consistent algorithm for esti- 
mating the number of change points and locating the changes. 

Theorem 1. Let $x := X_{1..n} \in \mathcal{X}^n$, $n \in \mathbb{N}$ be a sequence with change-points at
least $\lambda_{\min}$ apart, for some $\lambda_{\min} \in (0, 1)$. Let $r$ denote the total number of
process distributions generating $x$. Then CluBChaPo$(x, \lambda, r)$ is asymptotically
consistent for all $\lambda \in (0, \lambda_{\min}]$.

The proof of Theorem 1 is deferred to Section 5; here we provide an intuitive
explanation of how the algorithm works and why it is consistent.
The algorithm works as follows. First, a (consistent) list-estimator is used to 
obtain an initial set of change-point candidates. The candidates are sorted in 
increasing order to produce a set S of consecutive non-overlapping segments of 
X. The set S is then partitioned into r clusters. In each cluster, the change- 
point candidate that joins a pair of consecutive segments of x is identified as 
redundant and is removed from the list. Once all of the redundant candidates are 
removed, the algorithm outputs the remaining change-point candidates. Next 
we give an intuitive explanation as to why the algorithm works. 

Since the list-estimator $\Upsilon$ is consistent, from some $n$ on an initial set of
possibly more than $\kappa$ change-points is generated that is guaranteed to have a
subset of size $\kappa$ whose elements are arbitrarily close to the true change-points.
Therefore, from some $n$ on the largest portion of each segment in $S$ is generated
by a single process distribution. Since the initial change-point candidates are at
least $n\lambda$ apart, the segments in $S$ have lengths linear in $n$. Thus, we can show
that from some $n$ on the distance between a pair of segments in $S$ converges
to 0 if and only if the same process distribution generates most of the two
segments. Given the total number of process distributions, from some $n$ on the
clustering algorithm groups together those and only those segments in $S$ that
are generated by the same process distribution. This lets the algorithm identify
and remove the redundant candidates. By the consistency of $\Upsilon$ the remaining
estimates converge to the true change-points.

Algorithm 1 Clustering-Based Change-Point (CluBChaPo) Estimator

input: $x \in \mathcal{X}^n$, $\lambda \in (0, \lambda_{\min}]$, number $r$ of process distributions

1. Obtain an initial (sorted) set of change-point candidates using a
   consistent list-estimator $\Upsilon$ (see Definition 3):

   $\Psi \leftarrow \Upsilon(x, \lambda)$ and let $m \leftarrow |\Psi|$

   $\{\psi_i : i = 1..m\} \leftarrow \mathrm{sort}(\{n\hat{\theta} : \hat{\theta} \in \Psi\})$, so that $i < j \Leftrightarrow \psi_i < \psi_j$, $i, j \in 1..m$.

2. Generate a set $S$ of consecutive segments:

   $$S \leftarrow \{x_i := X_{\psi_{i-1}+1..\psi_i} : i = 1..m + 1,\ \psi_0 := 0,\ \psi_{m+1} := n\} \tag{6}$$

3. Partition $S$ into $r$ clusters:

   Initialize $r$ farthest segments as cluster centers:

   $$c_1 \leftarrow 1, \quad c_j \leftarrow \operatorname*{argmax}_{i = 1..m+1} \min_{j' = 1..j-1} \hat{d}(x_i, x_{c_{j'}}), \quad j = 2..r \tag{7}$$

   Assign every segment to a cluster:

   $T(x_i) \leftarrow \operatorname*{argmin}_{j = 1..r} \hat{d}(x_i, x_{c_j})$, $i = 1..m + 1$

4. Eliminate redundant estimates:

   $C \leftarrow \{1..m\}$
   for $i = 1..m$ do
       if $T(x_i) = T(x_{i+1})$ then
           $C \leftarrow C \setminus \{i\}$
       end if
   end for
   $\hat{\kappa} \leftarrow |C|$

return: $\hat{\kappa}$, $\{\hat{\theta}_i := \frac{1}{n}\psi_i : i \in C\}$

As an example of a consistent list-estimator the method proposed by Khaleghi
and Ryabko (2012a) may be used. This algorithm outputs a list of estimates
whose first $\kappa$ elements converge to the true change-points, provided that the
parameter $\lambda$ satisfies $\lambda \in (0, \lambda_{\min}]$. Since $\kappa$ is unknown, all we can use here
is that the correct change-point estimates are somewhere in the list. In
general the algorithm may use any list-estimator that is consistent (in the sense
of Definition 3) for stationary ergodic time series. In the proposed algorithm
the following consistent clustering procedure is used. First, a total of $r$ cluster
centers are obtained as follows. The first segment $x_1$ is the first cluster center.
Through an iteration on $j = 2..r$ a segment is chosen as a cluster center if it
has the highest minimum distance from the previously chosen cluster centers.
Once the cluster centers are specified, the remaining segments are assigned to
the closest cluster.
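The segmentation, clustering and elimination steps described above can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' code: the candidate indices are assumed to come from some list-estimator that is not implemented here, `dist` is any user-supplied distance between segments (the paper uses the empirical distributional distance), the function name `club_chapo_from_candidates` is invented, and it is assumed that $r$ does not exceed the number of segments. Unlike the argmax in the algorithm, the sketch skips segments that are already centers, which only matters when several segments are at distance zero from the chosen centers.

```python
def club_chapo_from_candidates(x, candidates, r, dist):
    """Segment x at the candidates, cluster the segments, drop redundant cuts.

    x          -- the full sequence (a list)
    candidates -- candidate change-point indices, each strictly between 0 and len(x)
    r          -- number of process distributions (assumed <= number of segments)
    dist       -- function mapping two segments to a non-negative distance
    """
    psi = sorted(candidates)
    bounds = [0] + psi + [len(x)]
    segments = [x[bounds[i]:bounds[i + 1]] for i in range(len(bounds) - 1)]

    # Farthest-point initialisation: the first segment is the first center,
    # then repeatedly add the segment whose nearest center is farthest away.
    centers = [0]
    while len(centers) < r:
        best = max(
            (i for i in range(len(segments)) if i not in centers),
            key=lambda i: min(dist(segments[i], segments[c]) for c in centers),
        )
        centers.append(best)

    # Assign every segment to the cluster of its nearest center.
    labels = [
        min(range(len(centers)), key=lambda j: dist(seg, segments[centers[j]]))
        for seg in segments
    ]

    # A candidate is kept only if it separates segments from different clusters.
    kept = [psi[i] for i in range(len(psi)) if labels[i] != labels[i + 1]]
    return len(kept), [p / len(x) for p in kept]
```

With a toy distance such as the absolute difference of segment means, a sequence of ten 0s, ten 1s and ten 0s with candidates at 5, 10, 20 and 25 yields two estimated change points at 10/30 and 20/30; the candidates at 5 and 25 are removed because they join segments placed in the same cluster.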

Remark 2 (Computational Complexity). In this implementation, an initial
set of $\lambda^{-1}$ change-point candidates is obtained by the algorithm of Khaleghi and
Ryabko (2012a), which as shown by the authors has complexity $O(n^2\,\mathrm{polylog}\,n)$.
It is easy to see that the clustering procedure requires $r\lambda^{-1}$ pairwise distance
calculations to partition the $\lambda^{-1} + 1$ segments into $r$ groups. By Remark 1,
$\hat{d}(\cdot,\cdot)$ has computational complexity of $O(n\,\mathrm{polylog}\,n)$. The remaining
calculations are of order $O(r(\lambda^{-1} + 1))$. This brings the resource complexity of the
proposed algorithm to $O(n^2\,\mathrm{polylog}\,n)$.

5 Proof of Theorem 1

In this section we prove the consistency of the proposed algorithm. The proof
relies on Lemma 1. We introduce the following additional notation. Consider
the set $S$ of segments specified by (6) in Algorithm 1. For every segment $x_i :=
X_{\psi_{i-1}+1..\psi_i} \in S$ where $i = 1..m + 1$ define $\rho_i$ as the process distribution that
generates the largest portion of $x_i$; that is, first define

$$K := \operatorname*{argmax}_{k \in 1..\kappa+1} |\{\psi_{i-1} + 1, \dots, \psi_i\} \cap \{n\theta_{k-1} + 1, \dots, n\theta_k\}|$$

and then let $\rho_i := \rho_j$ where $j$ is such that $K \in G_j$, and $G_j$, $j = 1..r$ are the
ground-truth partitions defined by (4).

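Lemma 1. Let $x \in \mathcal{X}^n$, $n \in \mathbb{N}$ be a sequence with $\kappa$ change-points at least $\lambda_{\min}$
apart for some $\lambda_{\min} \in (0, 1)$. Assume that the distributions that generate $x$
are stationary and ergodic. Let $S$ be the set of segments specified by (6) in
Algorithm 1. For all $\lambda \in (0, \lambda_{\min})$ with probability one we have

$$\lim_{n \to \infty} \sup_{x_i \in S} \hat{d}(x_i, \rho_i) = 0.$$

Proof. Fix an $\varepsilon \in (0, \lambda/2)$. There exists some $T$ such that

$$\sum_{m,l=T}^{\infty} w_m w_l \le \varepsilon. \tag{8}$$

Moreover, for every $n \ge T/(\varepsilon\lambda)$ and $m \in 1..T$ we have

$$\frac{m}{n\lambda} \le \varepsilon. \tag{9}$$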

For simplicity of notation define $\pi_k := n\theta_k$, $k = 1..\kappa$. Since the initial set
of change-point candidates is produced by a consistent list-estimator $\Upsilon$ (see
Definition 3), there exists an index-set $I := \{\mu_1, \dots, \mu_\kappa\} \in \{1..m\}^\kappa$ and some
$N_0$ such that for all $n \ge N_0$ we have

$$\sup_{k \in 1..\kappa} \frac{1}{n} |\psi_{\mu_k} - \pi_k| \le \varepsilon. \tag{10}$$

Moreover, the initial candidates are at least $n\lambda$ apart so that

$$\inf_{i \in 1..m+1} \psi_i - \psi_{i-1} \ge n\lambda \tag{11}$$

where $\psi_0 := 0$ and $\psi_{m+1} := n$. Let $I' := \{1..m\} \setminus I$. By (10) and (11) for
all $n \ge N_0$ the candidates indexed by $I'$ have linear distances from the true
change-points:

$$\inf_{k \in 1..\kappa,\ i \in I'} |\pi_k - \psi_i| \ge \inf_{k \in 1..\kappa,\ i \in I'} |\psi_i - \psi_{\mu_k}| - |\pi_k - \psi_{\mu_k}| \ge n(\lambda - \varepsilon). \tag{12}$$

Denote by $S_1 := \{x_i := X_{\psi_{i-1}+1..\psi_i} \in S : \{i, i - 1\} \cap I = \emptyset\}$ the subset of the
segments in $S$ whose elements are formed by joining pairs of consecutive elements
of $I'$ and let $S_2 := S \setminus S_1$ be its complement. Let the true change-points that
appear immediately to the left and to the right of an index $j \in 1..n - 1$ be given
by

$$\mathcal{L}(j) := \max_{k \in 0..\kappa+1,\ \pi_k \le j} \pi_k \quad \text{and} \quad \mathcal{R}(j) := \min_{k \in 0..\kappa+1,\ \pi_k \ge j} \pi_k$$

respectively, with $\pi_0 := 0$, $\pi_{\kappa+1} := n$, where equality occurs when $j$ is itself a
change-point.

1. Consider $x_i := X_{\psi_{i-1}+1..\psi_i} \in S_1$. Observe that by definition $x_i$
cannot contain a true change-point for $n \ge N_0$ since otherwise either $i - 1$ or $i$
would belong to $I$, contradicting the assumption that $x_i \in S_1$. Therefore for all
$n \ge N_0$ we have $\rho_i = \rho$ where $\rho \in \{\rho_1, \dots, \rho_r\}$ is the process distribution that
generates $X_{\mathcal{L}(\psi_{i-1})..\mathcal{R}(\psi_{i-1})}$. To show that $\hat{d}(x_i, \rho) \le \varepsilon$ we proceed as follows.
For each $m, l \in 1..T$ we can find a finite subset $\beta^{m,l}$ of $B^{m,l}$ such that $\rho(\beta^{m,l}) \ge
1 - \varepsilon$. Observe that the segments $X_{\mathcal{L}(\psi_{i-1})..b}$ have lengths at least $\lambda n$ for all
$b \in \mathcal{L}(\psi_{i-1}) + n\lambda..\mathcal{R}(\psi_{i-1})$. Therefore, for every $B \in \beta^{m,l}$, $m, l \in 1..T$ there exists
some $N(B)$ such that for all $n \ge N(B)$ with probability 1 we have

$$\sup_{b \in \mathcal{L}(\psi_{i-1}) + n\lambda..\mathcal{R}(\psi_{i-1})} |\nu(X_{\mathcal{L}(\psi_{i-1})..b}, B) - \rho(B)| \le \varepsilon. \tag{13}$$

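Using the definition of $\nu(\cdot,\cdot)$ given by (1) we obtain the following algebraic
manipulation of the frequency function. For every $B \in B^{m,l}$, $m, l \in \mathbb{N}$ we have

$$\nu(x_i, B) = \frac{\psi_i - \mathcal{L}(\psi_{i-1}) - m + 1}{\psi_i - \psi_{i-1} - m + 1}\, \nu(X_{\mathcal{L}(\psi_{i-1})+1..\psi_i}, B) - \frac{\psi_{i-1} - \mathcal{L}(\psi_{i-1}) - m + 1}{\psi_i - \psi_{i-1} - m + 1}\, \nu(X_{\mathcal{L}(\psi_{i-1})+1..\psi_{i-1}}, B) - \frac{1}{\psi_i - \psi_{i-1} - m + 1} \sum_{j = \psi_{i-1} - m + 2}^{\psi_{i-1}} \mathbb{I}\{X_{j..j+m-1} \in B\} \tag{14}$$

where the last summation is upper bounded (in absolute value) by
$\frac{m - 1}{\psi_i - \psi_{i-1} - m + 1}$.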



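Let $N'_i := \max\{N_0,\ \max_{B \in \beta^{m,l},\ m,l \in 1..T} N(B),\ T/(\varepsilon\lambda)\}$. For all $n \ge N'_i$ we have

$$\hat{d}(x_i, \rho) = \sum_{m,l=1}^{\infty} w_m w_l \sum_{B \in B^{m,l}} |\nu(x_i, B) - \rho(B)|$$

$$\le \sum_{m,l=1}^{T} w_m w_l \sum_{B \in \beta^{m,l}} \left( \frac{\psi_i - \psi_{i-1} - m + 1}{\psi_i - \psi_{i-1}} |\nu(x_i, B) - \rho(B)| + \frac{m - 1}{\psi_i - \psi_{i-1}} \right) + 2\varepsilon \tag{15}$$

$$\le \sum_{m,l=1}^{T} w_m w_l \sum_{B \in \beta^{m,l}} \left( \frac{\psi_i - \mathcal{L}(\psi_{i-1}) - m + 1}{\psi_i - \psi_{i-1}} |\nu(X_{\mathcal{L}(\psi_{i-1})+1..\psi_i}, B) - \rho(B)| + \frac{\psi_{i-1} - \mathcal{L}(\psi_{i-1}) - m + 1}{\psi_i - \psi_{i-1}} |\nu(X_{\mathcal{L}(\psi_{i-1})+1..\psi_{i-1}}, B) - \rho(B)| + \frac{2(m - 1)}{\psi_i - \psi_{i-1}} \right) + 2\varepsilon \tag{16}$$

$$\le 2\varepsilon(2 + \lambda^{-1}) \tag{17}$$

where (15) follows from (8), the definition of $\beta^{m,l}$ and the fact that $|\nu(\cdot,\cdot) -
\rho(\cdot)| \le 1$; (16) follows from (14), and (17) follows from (9), (11), and (13).

Let $N' := \max_{i \in 1..|S_1|} N'_i$. For all $n \ge N'$ we have

$$\sup_{x_i \in S_1} \hat{d}(x_i, \rho_i) \le 2\varepsilon(2 + \lambda^{-1}). \tag{18}$$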

2. Take $x_i := X_{\psi_{i-1}+1..\psi_i} \in S_2$. Observe that by definition $I \cap \{i, i - 1\} \ne \emptyset$
so that either $i - 1$ or $i$ belongs to $I$. We prove the statement for the case
where $i - 1 \in I$. The case where $i \in I$ is analogous. We start by showing that
$[\psi_{i-1}, \psi_i] \subseteq [\pi - n\varepsilon, \pi' + n\varepsilon]$ for all $n \ge N_0$ where

$$\pi := \operatorname*{argmin}_{\pi_k,\ k = 1..\kappa} \frac{1}{n} |\pi_k - \psi_{i-1}| \quad \text{and} \quad \pi' := \mathcal{R}(\pi).$$

Since $i - 1 \in I$, by (10) for all $n \ge N_0$ we have $\frac{1}{n}|\pi - \psi_{i-1}| \le \varepsilon$. We have
two cases. Either $i \in I$ so that by (10) for all $n \ge N_0$ we have $\frac{1}{n}|\psi_i - \pi'| \le \varepsilon$,
or $i \in I'$ in which case $\psi_i < \pi'$. To see the latter statement assume by way
of contradiction that $\psi_i > \pi'$ where $\pi' \ne n$ (the statement trivially holds for
$\pi' = n$). By the consistency of $\Upsilon$ there exists some $j > i - 1$, $j \in I$ such that
$\frac{1}{n}|\psi_j - \pi'| \le \varepsilon$ for all $n \ge N_0$. Thus from (10) and (12) we obtain that $\frac{1}{n}(\psi_i - \psi_j) \ge
\lambda - 2\varepsilon > 0$. Since the initial estimates are sorted in increasing order, this implies
$j < i$, leading to a contradiction. Thus we have $[\psi_{i-1}, \psi_i] \subseteq [\pi - n\varepsilon, \pi' + n\varepsilon]$ so
that $\rho_i = \rho$ where $\rho$ is the process distribution $\rho \in \{\rho_1, \dots, \rho_r\}$ that generates
$X_{\pi..\pi'}$. To show that $\hat{d}(x_i, \rho) \le \varepsilon$ we proceed as follows. Let $\pi'' := \min\{\psi_i, \pi'\}$.
It is easy to see that by (5), (12), and the assumptions that $\lambda_{\min} > 0$ and
$\lambda \in (0, \lambda_{\min})$ the segment $X_{\pi..\pi''}$ has length at least $n\lambda$. Therefore, for each
$m, l \in 1..T$ we can find a finite subset $\beta^{m,l}$ of $B^{m,l}$ such that $\rho(\beta^{m,l}) \ge 1 - \varepsilon$. For
every $B \in \beta^{m,l}$, $m, l \in 1..T$ there exists some $N'(B)$ such that for all $n \ge N'(B)$
we have

$$|\nu(X_{\pi+1..\pi''}, B) - \rho(B)| \le \varepsilon. \tag{19}$$

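For every $B \in B^{m,l}$, $m, l \in 1..T$ we have the following algebraic manipulation
of $\nu(x_i, B)$.

$$\nu(x_i, B) = \frac{1}{\psi_i - \psi_{i-1} - m + 1} \Bigg( (\pi'' - \pi - m + 1)\, \nu(X_{\pi+1..\pi''}, B) + \mathbb{I}\{\psi_i > \pi'\} \sum_{j = \pi' - m + 2}^{\psi_i - m + 1} \mathbb{I}\{X_{j..j+m-1} \in B\} + \mathbb{I}\{\psi_{i-1} < \pi\} \sum_{j = \psi_{i-1} + 1}^{\pi} \mathbb{I}\{X_{j..j+m-1} \in B\} - \mathbb{I}\{\psi_{i-1} > \pi\} \sum_{j = \pi + 1}^{\psi_{i-1}} \mathbb{I}\{X_{j..j+m-1} \in B\} \Bigg) \tag{20}$$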



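For all $B \in \beta^{m,l}$, $m, l \in 1..T$ and all $n \ge \max\{N_0, \max_{B \in \beta^{m,l},\ m,l \in 1..T} N'(B)\}$
we have

$$|\nu(x_i, B) - \rho(B)| \le \frac{\pi'' - \pi - m + 1}{\psi_i - \psi_{i-1} - m + 1} |\nu(X_{\pi+1..\pi''}, B) - \rho(B)| + \frac{\psi_i - \pi''}{n\lambda} + \frac{|\psi_{i-1} - \pi|}{n\lambda} \le 3\varepsilon\lambda^{-1} \tag{21}$$

where the first inequality follows from (20) and the second inequality follows
from (10), (11) and (19). Let $N''_i := \max\{N_0,\ \max_{B \in \beta^{m,l},\ m,l \in 1..T} N'(B),\ T/(\varepsilon\lambda)\}$.
For all $n \ge N''_i$ we have

$$\hat{d}(x_i, \rho) \le \sum_{m,l=1}^{T} w_m w_l \sum_{B \in \beta^{m,l}} |\nu(x_i, B) - \rho(B)| + 2\varepsilon \le 2\varepsilon(1 + 2\lambda^{-1}) \tag{22}$$

where the first inequality follows from (8), the definition of $\beta^{m,l}$ and observing
that $|\nu(\cdot,\cdot) - \rho(\cdot)| \le 1$, and the second inequality follows from (9), (11) and (21).
Let $N'' := \max_{i \in 1..|S_2|} N''_i$. For $n \ge N''$ we have

$$\sup_{x_i \in S_2} \hat{d}(x_i, \rho_i) \le 2\varepsilon(1 + 2\lambda^{-1}). \tag{23}$$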

Finally, by (18) and (23) for all $n \ge \max\{N', N''\}$ we have $\sup_{x_i \in S} \hat{d}(x_i, \rho_i) \le
2\varepsilon(3 + 2\lambda^{-1})$. Since $\varepsilon$ can be chosen arbitrarily small, this proves the statement. □

Proof (of Theorem 1). Let $\delta := \min_{r' \ne r'' \in 1..r} d(\rho_{r'}, \rho_{r''})$ denote the minimum
distance between the distinct distributions that generate $x$. Fix an $\varepsilon \in (0, \delta/4)$.
By Lemma 1 and applying the triangle inequality there exists some $N_1$ such
that for all $n \ge N_1$ we have

$$\inf_{x_i, x_j \in S,\ \rho_i \ne \rho_j} \hat{d}(x_i, x_j) \ge \delta - 2\varepsilon, \quad \text{and} \quad \sup_{x_i, x_j \in S,\ \rho_i = \rho_j} \hat{d}(x_i, x_j) \le 2\varepsilon. \tag{24}$$

Let $\pi_k := n\theta_k$, $k = 1..\kappa$. By the consistency of $\Upsilon$ (see Definition 3 and
Proposition 1) there exists some $N_2$ such that for all $n \ge N_2$ there exists a set
$\{\mu_1, \dots, \mu_\kappa\} \in \{1..m\}^\kappa$ such that

$$\sup_{k \in 1..\kappa} \frac{1}{n} |\psi_{\mu_k} - \pi_k| \le \varepsilon. \tag{25}$$

Let $N := \max_{i = 1, 2} N_i$. By (24) for all $n \ge N$ we have

$$\max_{i \in 1..m+1} \min_{j' \in 1..j-1} \hat{d}(x_i, x_{c_{j'}}) \ge \delta - 2\varepsilon, \quad j = 2..r, \tag{26}$$

where $c_j$, $j = 1..r$ is given by (7). Hence, the cluster centers $x_{c_j}$, $j = 1..r$ are
each generated by a different process distribution. On the other hand, the rest
of the segments are each assigned to the closest cluster, so that from (24) for all
$n \ge N$ we have

$$T(x_i) = T(x_{i'}) \Leftrightarrow \rho_i = \rho_{i'}. \tag{27}$$

By construction the index-set $C$ generated by Algorithm 1 corresponds to those
and only those change-point candidates that separate consecutive segments
assigned to different clusters; by (27) for all $n \ge N$ and all $i \in C$ we have $\rho_i \ne \rho_{i+1}$.
Thus $\hat{\kappa} = \kappa$ and $\hat{\theta}_k = \frac{1}{n}\psi_{\mu_k}$, $k = 1..\kappa$. Notice that by (25) $\psi_{\mu_k}$, $k = 1..\kappa$ are
consistent estimates of $\pi_k$. □



6 Experimental Results 

In this section we present empirical evaluations of our algorithms on synthet- 
ically generated data. To generate the data we use stationary ergodic process 
distributions that do not belong to any "simpler" general class of time-series, 
and cannot be approximated by finite state models. In particular they cannot be 
modeled by hidden Markov process distributions with finite state-spaces.
Moreover, the single-dimensional marginals of all distributions are the same
throughout the generated sequence. Similar distribution families are commonly used as
examples in this framework (see, e.g., Shields 1996). The distributions and the
procedure to generate a sequence $x := X_1, \dots, X_m \in \mathbb{R}^m$, $m \in \mathbb{N}$ are as follows.
Fix a parameter $\alpha \in (0, 1)$ and two uniform distributions $U_1$ and $U_2$. Let $r_0$ be
drawn randomly from $[0, 1]$. For each $i = 1..m$ obtain $r_i := r_{i-1} + \alpha \bmod 1$ and
draw $x_i^{(j)}$ from $U_j$, $j = 1, 2$. Finally set $X_i := \mathbb{I}\{r_i \le 0.5\} x_i^{(1)} + \mathbb{I}\{r_i > 0.5\} x_i^{(2)}$.
If $\alpha$ is irrational [1] this produces a real-valued stationary ergodic time-series.
In the experiments we fixed three parameters $\alpha_1 := 0.12..$, $\alpha_2 := 0.13..$ and
$\alpha_3 := 0.14..$ (with long mantissae) to correspond to $r = 3$ different process
distributions. To produce $x \in \mathbb{R}^n$ we randomly generated $\kappa := 5$ change-points
$\theta_k$, $k = 1..\kappa$ at least $\lambda_{\min}$ apart, with $\lambda_{\min} := 0.1$. Every segment of length
$n_k := n(\theta_k - \theta_{k-1})$, $k = 1..\kappa + 1$ with $\theta_0 := 0$, $\theta_{\kappa+1} := 1$ was generated with $\alpha_{k'}$
and $U_k$ where $k' := k \bmod r$, $k \in 0..\kappa + 1$. In our experiments we provide $\lambda :=
0.6\lambda_{\min}$ as input and calculate the error as $\mathbb{I}\{|C| \ne \kappa\} + \mathbb{I}\{|C| = \kappa\} \sum_{k=1}^{\kappa} |\hat{\theta}_k - \theta_k|$.
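The generation procedure and the error measure just described can be sketched as follows. This is a minimal sketch under several assumptions that are not taken from the paper: the supports of $U_1$ and $U_2$ are passed in as `(low, high)` pairs because the text does not specify them, the irrational $\alpha$ is approximated by an ordinary Python float, `rng.random()` draws $r_0$ from $[0, 1)$ rather than $[0, 1]$, and the function names are invented for the illustration.

```python
import random


def generate_segment(length, alpha, u1, u2, rng):
    """Generate one segment driven by the rotation r_i = r_{i-1} + alpha mod 1.

    u1, u2 -- (low, high) bounds of the uniform distributions U_1 and U_2
    rng    -- a random.Random instance
    """
    r = rng.random()                      # r_0
    out = []
    for _ in range(length):
        r = (r + alpha) % 1.0             # r_i
        x1 = rng.uniform(*u1)             # draw from U_1
        x2 = rng.uniform(*u2)             # draw from U_2
        out.append(x1 if r <= 0.5 else x2)
    return out


def estimation_error(estimated, true):
    """1 if the number of estimates is wrong, else the summed absolute deviations."""
    if len(estimated) != len(true):
        return 1.0
    return sum(abs(e - t) for e, t in zip(sorted(estimated), sorted(true)))
```

A full sequence would be obtained by concatenating segments generated with different values of `alpha`, one per process distribution, with segment lengths determined by the chosen change points.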



[1] $\alpha$ is simulated by a long double with a long mantissa.

[Figure 1: plot of error rate against sequence length; horizontal axis: Sequence Length]

Figure 1: Average (over 40 runs) error rates of our algorithm
CluBChaPo$(x, \lambda, r)$ and the list-estimator $\Upsilon$ of Khaleghi and Ryabko (2012a)
as a function of the length $n$ of the input sequence $x \in \mathbb{R}^n$, where $x$ has
$\kappa = 4$ change-points $\lambda_{\min} := 0.1$ apart and is generated by $r = 3$ distributions;
$\lambda := 0.6\lambda_{\min}$. The error of $\Upsilon(x, \lambda)$ is based on its first $\kappa$ elements.



Note that, with this data generation procedure, the single-dimensional marginals
are the same throughout the sequence. Most of the existing algorithms do not
work at all in this scenario. To the best of our knowledge, the only work to
address the change-point problem under this general framework is that of Khaleghi
and Ryabko (2012a), which we use here for comparison. However, this method
is a list-estimator in the sense of Definition 3 and makes no attempt to estimate
$\kappa$. It simply generates a sorted list of estimates, whose first $\kappa$ elements converge
to the true change-points; we calculate the error on the first $\kappa$ elements of its
output.



7 Discussion 

We have presented an asymptotically consistent method to estimate the number
of change-points and to locate the changes in highly dependent time-series data.
The considered framework is very general and as such is suitable for real-world 
applications. 

Note that in this setting rates of convergence (even of frequencies to respec- 
tive probabilities) are provably impossible to obtain. Therefore, unlike in the 
traditional settings for change-point analysis, the algorithms developed for this 
framework are forced not to rely on any rates of convergence. We see this as 
an advantage of the framework as it means that the algorithms are applicable
to a much wider range of situations. At the same time, it may be interesting
to derive the rates of convergence of the proposed algorithm under stronger 
assumptions (e.g., i.i.d. data, or some mixing conditions). We conjecture that 
the algorithm is indeed optimal (up to some constant factors) in such settings 
as well (although it clearly cannot be optimal under parametric assumptions); 
however, we leave this as future work. 

In the proposed algorithm a specific consistent clustering method is used to 
estimate the number of change-points. An interesting extension would be to 
establish the consistency of this method using any list-estimator in
combination with any time-series clustering algorithm that possesses suitable asymptotic
consistency guarantees.

Finally, the consistency of the algorithm is established when the
distributional distance is used as the distance between the segments. The proof relies
on some properties specific to this distance. Other distances can also be used
in problems concerning stationary ergodic time series (e.g., Ryabko and Mary
2012); thus, it is interesting to investigate which distances can be used with the
algorithm proposed in the current paper.